SeGAN: Segmenting and Generating the Invisible
Objects often occlude each other in scenes; inferring their appearance beyond
their visible parts plays an important role in scene understanding, depth
estimation, object interaction and manipulation. In this paper, we study the
challenging problem of completing the appearance of occluded objects. Doing so
requires knowing which pixels to paint (segmenting the invisible parts of
objects) and what color to paint them (generating the invisible parts). Our
proposed novel solution, SeGAN, jointly optimizes for both segmentation and
generation of the invisible parts of objects. Our experimental results show
that: (a) SeGAN can learn to generate the appearance of the occluded parts of
objects; (b) SeGAN outperforms state-of-the-art segmentation baselines for the
invisible parts of objects; (c) trained on synthetic photorealistic images,
SeGAN can reliably segment natural images; (d) by reasoning about
occluder-occludee relations, our method can infer depth layering.
Comment: Accepted to CVPR18 as spotlight
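As an illustration of point (d), the sketch below shows one generic way pairwise occluder-occludee relations could be turned into a depth layering, using a topological sort. This is an assumption made for illustration and not SeGAN's actual procedure; the function name `depth_layers` and the pair format are hypothetical.

```python
from collections import defaultdict, deque


def depth_layers(num_objects, occludes):
    """Assign a depth layer to each object (0 = frontmost).

    `occludes` is a list of (occluder, occludee) index pairs.
    Hypothetical helper: a plain topological sort, not the paper's method.
    """
    successors = defaultdict(list)
    indegree = [0] * num_objects
    for front, back in occludes:
        successors[front].append(back)
        indegree[back] += 1

    layer = [0] * num_objects
    # Objects that nothing occludes start in the frontmost layer.
    queue = deque(i for i in range(num_objects) if indegree[i] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            # An occludee sits at least one layer behind each of its occluders.
            layer[nxt] = max(layer[nxt], layer[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    if visited != num_objects:
        raise ValueError("occlusion relations contain a cycle")
    return layer
```

For example, `depth_layers(3, [(0, 1), (1, 2)])` places object 0 in front, object 1 behind it, and object 2 at the back. Mutually occluding objects form a cycle and are rejected, since no consistent layering exists for them.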
Structure from Action: Learning Interactions for Articulated Object 3D Structure Discovery
Articulated objects are abundant in daily life. Discovering their parts,
joints, and kinematics is crucial for robots to interact with these objects. We
introduce Structure from Action (SfA), a framework that discovers the 3D part
geometry and joint parameters of unseen articulated objects via a sequence of
inferred interactions. Our key insight is that 3D interaction and perception
should be considered in conjunction to construct 3D articulated CAD models,
especially in the case of categories not seen during training. By selecting
informative interactions, SfA discovers parts and reveals initially occluded
surfaces, like the inside of a closed drawer. By aggregating visual
observations in 3D, SfA accurately segments multiple parts, reconstructs part
geometry, and infers all joint parameters in a canonical coordinate frame. Our
experiments demonstrate that a single SfA model trained in simulation can
generalize to many unseen object categories with unknown kinematic structures
and to real-world objects. Code and data will be publicly available.
Phone2Proc: Bringing Robust Robots Into Our Chaotic World
Training embodied agents in simulation has become mainstream for the embodied
AI community. However, these agents often struggle when deployed in the
physical world due to their inability to generalize to real-world environments.
In this paper, we present Phone2Proc, a method that uses a 10-minute phone scan
and conditional procedural generation to create a distribution of training
scenes that are semantically similar to the target environment. The generated
scenes are conditioned on the wall layout and arrangement of large objects from
the scan, while also sampling lighting, clutter, surface textures, and
instances of smaller objects with randomized placement and materials.
Leveraging just a simple RGB camera, training with Phone2Proc improves the
sim-to-real ObjectNav success rate from 34.7% to 70.7% across a test suite of
over 200 trials in diverse real-world
environments, including homes, offices, and RoboTHOR. Furthermore, Phone2Proc's
diverse distribution of generated scenes makes agents remarkably robust to
changes in the real world, such as human movement, object rearrangement,
lighting changes, or clutter.
Comment: https://allenai.org/project/phone2pro
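The conditional procedural generation described in this abstract can be pictured with a minimal sketch: the wall layout and large-object arrangement stay fixed across samples, while lighting, textures, and small objects are randomized. All names here (`ScanLayout`, `Scene`, `sample_scenes`, and the option lists) are hypothetical and chosen for illustration; this is not the Phone2Proc code or API.

```python
import random
from dataclasses import dataclass

# Hypothetical option lists, for illustration only.
TEXTURES = ["plaster", "brick", "wood_panel"]
MATERIALS = ["plastic", "metal", "ceramic"]
SMALL_OBJECT_TYPES = ["mug", "book", "remote", "plant"]


@dataclass(frozen=True)
class ScanLayout:
    """Fixed conditioning: what a phone scan is assumed to provide."""
    walls: tuple          # e.g. wall segments as ((x0, y0), (x1, y1))
    large_objects: tuple  # e.g. (name, x, y) for furniture


@dataclass
class Scene:
    layout: ScanLayout    # shared by every sampled scene
    lighting: float       # randomized brightness in [0.3, 1.0]
    wall_texture: str     # randomized surface texture
    small_objects: list   # randomized clutter: (type, x, y, material)


def sample_scene(layout, rng):
    """Draw one scene: keep the layout, randomize everything else."""
    clutter = [
        (rng.choice(SMALL_OBJECT_TYPES), rng.uniform(0.0, 1.0),
         rng.uniform(0.0, 1.0), rng.choice(MATERIALS))
        for _ in range(rng.randint(0, 5))
    ]
    return Scene(layout=layout,
                 lighting=rng.uniform(0.3, 1.0),
                 wall_texture=rng.choice(TEXTURES),
                 small_objects=clutter)


def sample_scenes(layout, count, seed=0):
    """Build a distribution of training scenes conditioned on one layout."""
    rng = random.Random(seed)
    return [sample_scene(layout, rng) for _ in range(count)]
```

Calling `sample_scenes(layout, 100)` yields 100 scenes that share one layout but differ in lighting, texture, and clutter, which is the kind of variation the abstract credits for robustness to real-world changes.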